<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="topic_ufw_zn2_jr">
  <title>Auditing</title>
  <shortdesc>Describes Greenplum Database events that are logged and should be monitored to detect
    security threats.</shortdesc>
  <body>
    <p>Greenplum Database is capable of auditing a variety of events, including startup and shutdown
      of the system, segment database failures, SQL statements that result in an error, and all
      connection attempts and disconnections. Greenplum Database also logs SQL statements and
      information regarding SQL statements, and can be configured in a variety of ways to record
      audit information with more or less detail. The <codeph>log_error_verbosity</codeph>
      configuration parameter controls the amount of detail written in the server log for each
      message that is logged.  Similarly, the <codeph>log_min_error_statement</codeph> parameter
      allows administrators to configure the level of detail recorded specifically for SQL
      statements, and the <codeph>log_statement</codeph> parameter determines the kind of SQL
      statements that are audited. Greenplum Database records the username for all auditable events,
      when the event is initiated by a subject outside the Greenplum Database.</p>
    <p>Greenplum Database prevents unauthorized modification and deletion of audit records by only
      allowing administrators with an appropriate role to perform any operations on log files.  Logs
      are stored in a proprietary format using comma-separated values (CSV).  Each segment and the
      master stores its own log files, although these can be accessed remotely by an administrator. 
      Greenplum Database also authorizes overwriting of old log files via the
        <codeph>log_truncate_on_rotation</codeph> parameter.  This is a local parameter and must be
      set on each segment and master configuration file.</p>
    <p>Greenplum provides an administrative schema called <codeph>gp_toolkit</codeph> that you can
      use to query log files, as well as system catalogs and operating environment for system status
      information. For more information, including usage, refer to <i>The gp_tookit Administrative
        Schema</i> appendix in the <i>Greenplum Database Reference Guide</i>.</p>
    <section>
      <title>Viewing the Database Server Log Files</title>
      <p> Every database instance in Greenplum Database (master and segments) is a running
        PostgreSQL database server with its own server log file. Daily log files are created in the
          <codeph>pg_log</codeph> directory of the master and each segment data directory. </p>
      <p> The server log files are written in comma-separated values (CSV) format. Not all log
        entries will have values for all of the log fields. For example, only log entries associated
        with a query worker process will have the <codeph>slice_id</codeph> populated. Related log
        entries of a particular query can be identified by its session identifier
          (<codeph>gp_session_id</codeph>) and command identifier
        (<codeph>gp_command_count</codeph>).</p>
      <table>
        <tgroup cols="0">
          <colspec colwidth="10*" align="left"/>
          <colspec colwidth="20*" align="left"/>
          <colspec colwidth="20*" align="left"/>
          <colspec colwidth="35*" align="left"/>
          <thead>
            <row>
              <entry>
                <p> # </p>
              </entry>
              <entry>
                <p> Field Name </p>
              </entry>
              <entry>
                <p> Data Type </p>
              </entry>
              <entry>
                <p> Description </p>
              </entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>
                <p> 1 </p>
              </entry>
              <entry>
                <p> event_time </p>
              </entry>
              <entry>
                <p> timestamp with time zone </p>
              </entry>
              <entry>
                <p> Time that the log entry was written to the log </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 2 </p>
              </entry>
              <entry>
                <p> user_name </p>
              </entry>
              <entry>
                <p> varchar(100) </p>
              </entry>
              <entry>
                <p> The database user name </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 3 </p>
              </entry>
              <entry>
                <p> database_name </p>
              </entry>
              <entry>
                <p> varchar(100) </p>
              </entry>
              <entry>
                <p> The database name </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 4 </p>
              </entry>
              <entry>
                <p> process_id </p>
              </entry>
              <entry>
                <p> varchar(10) </p>
              </entry>
              <entry>
                <p> The system process id (prefixed with &quot;p&quot;) </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 5 </p>
              </entry>
              <entry>
                <p> thread_id </p>
              </entry>
              <entry>
                <p> varchar(50) </p>
              </entry>
              <entry>
                <p> The thread count (prefixed with &quot;th&quot;) </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 6 </p>
              </entry>
              <entry>
                <p> remote_host </p>
              </entry>
              <entry>
                <p> varchar(100) </p>
              </entry>
              <entry>
                <p> On the master, the hostname/address of the client machine. On the segment, the
                  hostname/address of the master. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 7 </p>
              </entry>
              <entry>
                <p> remote_port </p>
              </entry>
              <entry>
                <p> varchar(10) </p>
              </entry>
              <entry>
                <p> The segment or master port number </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 8 </p>
              </entry>
              <entry>
                <p> session_start_time </p>
              </entry>
              <entry>
                <p> timestamp with time zone </p>
              </entry>
              <entry>
                <p> Time session connection was opened </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 9 </p>
              </entry>
              <entry>
                <p> transaction_id </p>
              </entry>
              <entry>
                <p> int </p>
              </entry>
              <entry>
                <p> Top-level transaction ID on the master. This ID is the parent of any
                  subtransactions. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 10 </p>
              </entry>
              <entry>
                <p> gp_session_id </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> Session identifier number (prefixed with &quot;con&quot;) </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 11 </p>
              </entry>
              <entry>
                <p> gp_command_count </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> The command number within a session (prefixed with &quot;cmd&quot;) </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 12 </p>
              </entry>
              <entry>
                <p> gp_segment </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> The segment content identifier (prefixed with &quot;seg&quot; for primaries or
                  &quot;mir&quot; for mirrors). The master always has a content id of -1. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 13 </p>
              </entry>
              <entry>
                <p> slice_id </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> The slice id (portion of the query plan being run) </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 14 </p>
              </entry>
              <entry>
                <p> distr_tranx_id </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> Distributed transaction ID </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 15 </p>
              </entry>
              <entry>
                <p> local_tranx_id </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> Local transaction ID </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 16 </p>
              </entry>
              <entry>
                <p> sub_tranx_id </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> Subtransaction ID </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 17 </p>
              </entry>
              <entry>
                <p> event_severity </p>
              </entry>
              <entry>
                <p> varchar(10) </p>
              </entry>
              <entry>
                <p> Values include: LOG, ERROR, FATAL, PANIC, DEBUG1, DEBUG2 </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 18 </p>
              </entry>
              <entry>
                <p> sql_state_code </p>
              </entry>
              <entry>
                <p> varchar(10) </p>
              </entry>
              <entry>
                <p> SQL state code associated with the log message </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 19 </p>
              </entry>
              <entry>
                <p> event_message </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> Log or error message text </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 20 </p>
              </entry>
              <entry>
                <p> event_detail </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> Detail message text associated with an error or warning message </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 21 </p>
              </entry>
              <entry>
                <p> event_hint </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> Hint message text associated with an error or warning message </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 22 </p>
              </entry>
              <entry>
                <p> internal_query </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> The internally-generated query text </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 23 </p>
              </entry>
              <entry>
                <p> internal_query_pos </p>
              </entry>
              <entry>
                <p> int </p>
              </entry>
              <entry>
                <p> The cursor index into the internally-generated query text </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 24 </p>
              </entry>
              <entry>
                <p> event_context </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> The context in which this message gets generated </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 25 </p>
              </entry>
              <entry>
                <p> debug_query_string </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> User-supplied query string with full detail for debugging. This string can be
                  modified for internal use. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 26 </p>
              </entry>
              <entry>
                <p> error_cursor_pos </p>
              </entry>
              <entry>
                <p> int </p>
              </entry>
              <entry>
                <p> The cursor index into the query string </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 27 </p>
              </entry>
              <entry>
                <p> func_name </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> The function in which this message is generated </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 28 </p>
              </entry>
              <entry>
                <p> file_name </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> The internal code file where the message originated </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 29 </p>
              </entry>
              <entry>
                <p> file_line </p>
              </entry>
              <entry>
                <p> int </p>
              </entry>
              <entry>
                <p> The line of the code file where the message originated </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> 30 </p>
              </entry>
              <entry>
                <p> stack_trace </p>
              </entry>
              <entry>
                <p> text </p>
              </entry>
              <entry>
                <p> Stack trace text associated with this message </p>
              </entry>
            </row>
          </tbody>
        </tgroup>
      </table>
      <p>Greenplum provides a utility called <codeph>gplogfilter</codeph> that can be used to search
        through a Greenplum Database log file for entries matching the specified criteria. By
        default, this utility searches through the Greenplum master log file in the default logging
        location. For example, to display the last three lines of the master log file:
        <codeblock>$ gplogfilter -n 3</codeblock></p>
      <p> You can also use <codeph>gplogfilter</codeph> to search through all segment log files at
        once by running it through the <codeph>gpssh</codeph> utility. For example, to display the
        last three lines of each segment log
        file:<codeblock>$ gpssh -f seg_host_file
  => source /usr/local/greenplum-db/greenplum_path.sh
  => gplogfilter -n 3 /data*/*/gp*/pg_log/gpdb*.csv</codeblock></p>
      <p> The following are the Greenplum security-related audit (or logging) server configuration
        parameters that are set in the postgresql.conf configuration file:</p>
      <table>
        <tgroup cols="0">
          <colspec colwidth="15*" align="left"/>
          <colspec colwidth="15*" align="left"/>
          <colspec colwidth="10*" align="left"/>
          <colspec colwidth="25*" align="left"/>
          <thead>
            <row>
              <entry>
                <p> Field Name </p>
              </entry>
              <entry>
                <p> Value Range </p>
              </entry>
              <entry>
                <p> Default </p>
              </entry>
              <entry>
                <p> Description </p>
              </entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>
                <p> log_connections </p>
              </entry>
              <entry>
                <p> Boolean </p>
              </entry>
              <entry>
                <p> off </p>
              </entry>
              <entry>
                <p> This outputs a line to the server log detailing each successful connection. Some
                  client programs, like psql, attempt to connect twice while determining if a
                  password is required, so duplicate “connection received” messages do not always
                  indicate a problem. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_disconnections </p>
              </entry>
              <entry>
                <p> Boolean </p>
              </entry>
              <entry>
                <p> off </p>
              </entry>
              <entry>
                <p> This outputs a line in the server log at termination of a client session, and
                  includes the duration of the session. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_statement </p>
              </entry>
              <entry>
                <p> NONE </p>
                <p> DDL </p>
                <p> MOD </p>
                <p> ALL </p>
              </entry>
              <entry>
                <p> ALL </p>
              </entry>
              <entry>
                <p> Controls which SQL statements are logged. DDL logs all data definition commands
                  like CREATE, ALTER, and DROP commands. MOD logs all DDL statements, plus INSERT,
                  UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE and EXPLAIN ANALYZE statements
                  are also logged if their contained command is of an appropriate type. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_hostname </p>
              </entry>
              <entry>
                <p> Boolean </p>
              </entry>
              <entry>
                <p> off </p>
              </entry>
              <entry>
                <p> By default, connection log messages only show the IP address of the connecting
                  host. Turning on this option causes logging of the host name as well. Note that
                  depending on your host name resolution setup this might impose a non-negligible
                  performance penalty. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_duration </p>
              </entry>
              <entry>
                <p> Boolean </p>
              </entry>
              <entry>
                <p> off </p>
              </entry>
              <entry>
                <p> Causes the duration of every completed statement which satisfies log_statement
                  to be logged. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_error_verbosity </p>
              </entry>
              <entry>
                <p> TERSE </p>
                <p> DEFAULT </p>
                <p> VERBOSE </p>
              </entry>
              <entry>
                <p> DEFAULT </p>
              </entry>
              <entry>
                <p> Controls the amount of detail written in the server log for each message that is
                  logged. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_min_duration_statement </p>
              </entry>
              <entry>
                <p> number of milliseconds, 0, -1 </p>
              </entry>
              <entry>
                <p> -1 </p>
              </entry>
              <entry>
                <p> Logs the statement and its duration on a single log line if its duration is
                  greater than or equal to the specified number of milliseconds. Setting this to 0
                  will print all statements and their durations. -1 disables the feature. For
                  example, if you set it to 250 then all SQL statements that run 250ms or longer
                  will be logged. Enabling this option can be useful in tracking down unoptimized
                  queries in your applications. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_min_messages </p>
              </entry>
              <entry>
                <p> DEBUG5 </p>
                <p> DEBUG4 </p>
                <p> DEBUG3 </p>
                <p> DEBUG2 </p>
                <p> DEBUG1 </p>
                <p> INFO </p>
                <p> NOTICE </p>
                <p> WARNING </p>
                <p> ERROR </p>
                <p> LOG </p>
                <p> FATAL </p>
                <p> PANIC </p>
              </entry>
              <entry>
                <p> NOTICE </p>
              </entry>
              <entry>
                <p> Controls which message levels are written to the server log. Each level includes
                  all the levels that follow it. The later the level, the fewer messages are sent to
                  the log. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p>log_rotation_size</p>
              </entry>
              <entry>0 - INT_MAX/1024 kilobytes</entry>
              <entry>1048576</entry>
              <entry>
                <p>When greater than 0, a new log file is created when this number of kilobytes have
                  been written to the log. Set to zero to disable size-based creation of new log files. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_rotation_age </p>
              </entry>
              <entry>
                <p> Any valid time expression (number and unit) </p>
              </entry>
              <entry>
                <p> 1d </p>
              </entry>
              <entry>
                <p> Determines the lifetime of an individual log file. When this amount of time has
                  elapsed since the current log file was created, a new log file will be created.
                  Set to zero to disable time-based creation of new log files. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_statement_stats </p>
              </entry>
              <entry>
                <p> Boolean </p>
              </entry>
              <entry>
                <p> off </p>
              </entry>
              <entry>
                <p> For each query, write total performance statistics of the query parser, planner,
                  and executor to the server log. This is a crude profiling instrument. </p>
              </entry>
            </row>
            <row>
              <entry>
                <p> log_truncate_on_rotation </p>
              </entry>
              <entry>
                <p> Boolean </p>
              </entry>
              <entry>
                <p> off </p>
              </entry>
              <entry>
                <p> Truncates (overwrites), rather than appends to, any existing log file of the
                  same name. Truncation will occur only when a new file is being opened due to
                  time-based rotation. For example, using this setting in combination with a
                  log_filename such as gpseg#-%H.log would result in generating twenty-four hourly
                  log files and then cyclically overwriting them. When off, pre-existing files will
                  be appended to in all cases. </p>
              </entry>
            </row>
          </tbody>
        </tgroup>
      </table>
    </section>
  </body>
</topic>
